[v5] Return a BatchEncoding dict from apply_chat_template by default #41626
Rocketknight1 merged 11 commits into main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
It's a v5 breaking change, so cc @LysandreJik @Cyrilvallez @ArthurZucker for review to make sure you're okay with it.
Force-pushed from 0f4f200 to 9330959
assert self.rust_tokenizer_3b([" Sam", "Sam"]).input_ids == [[5502, 2], [5502, 2]]

@require_jinja
def test_tokenization_for_chat(self):
Out of curiosity, why do you remove this test?
It's extremely old - it's not really related to this PR. These tests predate chat templates; we just patched them to support chat templates after those were added. They only exist for a few models, and I don't think we want to keep them, because it's not clear what they test that the main chat template tests don't.
]

- output = self.tokenizer.apply_chat_template(conversation, tokenize=True)
+ output = self.tokenizer.apply_chat_template(conversation, tokenize=True).input_ids
Nit for consistency: since tokenize=True is the default, I guess you can remove it.
]
with self.assertRaises(ValueError):
-     tokenizer.encode_message_with_chat_template(conversation[0], add_generation_prompt=True)
+     tokenizer.encode_message_with_chat_template(conversation[0], add_generation_prompt=True, return_dict=False)
Nit again: Not sure if this change is needed
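To make the method under discussion concrete, here is a hedged sketch of incremental encoding with `encode_message_with_chat_template`, assuming `tokenizer` and `conversation` are the objects from the quoted test; `return_dict=False` is the kwarg shown in the diff above, keeping the plain list-of-ids return:

```python
# Sketch: encode a conversation one message at a time.
# Assumes `conversation` is a list of {"role": ..., "content": ...} dicts
# and `tokenizer` has a chat template, as in the quoted test.
token_ids = []
for message in conversation:
    # return_dict=False keeps the plain list-of-ids return
    # (the v5 default is now a BatchEncoding dict)
    token_ids.extend(
        tokenizer.encode_message_with_chat_template(message, return_dict=False)
    )

# Per the quoted test, this call is expected to raise ValueError:
# tokenizer.encode_message_with_chat_template(conversation[0], add_generation_prompt=True)
```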
Force-pushed from b729fb7 to c452142
Force-pushed from 2acbd45 to abf7158
Flip the default return type for `apply_chat_template` to match the underlying tokenizer
Force-pushed from abf7158 to 137015e
[For maintainers] Suggested jobs to run (before merge): run-slow: blenderbot, bloom, cohere, gemma, gpt2, gpt_sw3, llama, voxtral
This change is a bit unfortunate. It seems like an unnecessary API break that is going to cause a headache for a lot of people. You can see all the places we use this in mlx-lm, and all the models uploaded to the Hugging Face Hub have code snippets which include this (see e.g. https://huggingface.co/mlx-community/mistralai_Devstral-Small-2-24B-Instruct-2512-MLX-6Bit).
It is, and I'm sorry - but the …
Yes, as @Rocketknight1 said, we believe it helps far more users than it breaks current code... We unfortunately cannot go forward without breaking a few eggs... But we always provide a way to keep the exact same behavior as before!
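For anyone migrating: based on the diffs in this PR, here is a sketch of two ways to keep the pre-v5 behavior (`tokenizer` and `messages` stand in for any chat-enabled tokenizer and conversation; treat this as illustrative rather than official migration docs):

```python
# Option 1: opt out of the new dict return entirely
ids = tokenizer.apply_chat_template(messages, return_dict=False)

# Option 2: keep the new default BatchEncoding and pull out the token ids,
# as the updated tests in this PR do
ids = tokenizer.apply_chat_template(messages).input_ids
```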
Return a BatchEncoding dict from apply_chat_template by default (huggingface#41626)

* Flip the default return type for `apply_chat_template` to match the underlying tokenizer
* Remove test_tokenization_for_chat tests, which no longer do anything useful
* Remove test_tokenization_for_chat tests, which no longer do anything useful
* Fix test_encode_message tests
* Fix test_encode_message tests
* Return dicts for Processor too
* Fix mistral-common tests
* Catch one of the processors too
* revert test bug!
* nit fix
* nit fix
Tokenizers return a BatchEncoding dict by default, but `apply_chat_template` doesn't. This is just an accident of how I wrote it originally, which we were stuck with for backward compatibility reasons. Ideally, I think `apply_chat_template` should return exactly the same format as tokenizers, since it also performs tokenization most of the time. It's now `v5` time, so we can start making that happen 😅

This PR also updates tests, and removes very old `test_tokenization_for_chat` tests. These model-specific tests don't do anything useful anymore, since the `apply_chat_template` functionality is unified across tokenizers; they're mostly a legacy leftover from when model classes used to need custom chat tokenization functions.
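To illustrate the new default, a minimal sketch (the checkpoint name is just an example; any tokenizer with a chat template behaves the same way):

```python
from transformers import AutoTokenizer

# Example checkpoint; any chat-template-enabled tokenizer works the same way
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")
messages = [{"role": "user", "content": "Hello!"}]

# v5: apply_chat_template now returns a BatchEncoding dict, just like
# calling the tokenizer directly (typically input_ids and attention_mask)
encoded = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
print(list(encoded.keys()))
print(encoded.input_ids)  # attribute access works, as on any BatchEncoding
```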